in-context memory
Position: Episodic Memory is the Missing Piece for Long-Term LLM Agents
Pink, Mathis, Wu, Qinyuan, Vo, Vy Ai, Turek, Javier, Mu, Jianing, Huth, Alexander, Toneva, Mariya
As Large Language Models (LLMs) evolve from text-completion tools into fully fledged agents operating in dynamic environments, they must address the challenge of continually learning and retaining long-term knowledge. Many biological systems solve these challenges with episodic memory, which supports single-shot learning of instance-specific contexts. Inspired by this, we present an episodic memory framework for LLM agents, centered around five key properties of episodic memory that underlie adaptive and context-sensitive behavior. With various research efforts already partially covering these properties, this position paper argues that now is the right time for an explicit, integrated focus on episodic memory to catalyze the development of long-term agents. To this end, we outline a roadmap that unites several research directions under the goal to support all five properties of episodic memory for more efficient long-term LLM agents.
Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks
Pink, Mathis, Vo, Vy A., Wu, Qinyuan, Mu, Jianing, Turek, Javier S., Hasson, Uri, Norman, Kenneth A., Michelmann, Sebastian, Huth, Alexander, Toneva, Mariya
Current LLM benchmarks focus on evaluating models' memory of facts and semantic relations, primarily assessing semantic aspects of long-term memory. However, in humans, long-term memory also includes episodic memory, which links memories to their contexts, such as the time and place they occurred. The ability to contextualize memories is crucial for many cognitive tasks and everyday functions. This form of memory has not been evaluated in LLMs with existing benchmarks. To address the gap in evaluating memory in LLMs, we introduce Sequence Order Recall Tasks (SORT), which we adapt from tasks used to study episodic memory in cognitive psychology. SORT requires LLMs to recall the correct order of text segments, and provides a general framework that is both easily extendable and does not require any additional annotations. We present an initial evaluation dataset, Book-SORT, comprising 36k pairs of segments extracted from 9 books recently added to the public domain. Based on a human experiment with 155 participants, we show that humans can recall sequence order based on long-term memory of a book. We find that models can perform the task with high accuracy when relevant text is given in-context during the SORT evaluation. However, when presented with the book text only during training, LLMs' performance on SORT falls short. By making it possible to evaluate more aspects of memory, we believe that SORT will aid in the emerging development of memory-augmented models. Large language models (LLMs) have impressive performance on many benchmarks that test factual or semantic knowledge learned during training or in-context (Hendrycks et al., 2020; Ryo et al., 2023; Logan IV et al., 2019; Petroni et al., 2019; Yu et al., 2023; Sun et al., 2023). While these advances are noteworthy, the type of long-term knowledge that these datasets test is only one of several types that naturally intelligent systems store, retrieve, and update continuously over time (Norris, 2017; Izquierdo et al., 1999; McClelland et al., 1995). Current evaluation tasks do not assess episodic memory, which is a form of long-term knowledge thought to be important for cognitive function in humans and animals. In contrast to semantic memory, episodic memory links memories to their contexts, such as the time and place they occurred.